A Performance Prediction Framework for Data Intensive Applications on Large Scale Parallel Machines
Authors
Abstract
This paper presents a simulation-based performance prediction framework for large scale data-intensive applications on large scale machines. Our framework consists of two components: application emulators and a suite of simulators. Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing of critical application components (input data partitioning, data declustering, processing structure, etc.) easily and flexibly. Our suite of simulators models the I/O and communication subsystems with good accuracy and executes quickly on a high-performance workstation to allow performance prediction of large scale parallel machine configurations. The key to efficient simulation of very large scale configurations is a technique called loosely-coupled simulation where the processing structure of the application is embedded in the simulator, while preserving data dependencies and data distributions. We evaluate our performance prediction tool using a set of three data-intensive applications.
Similar resources
Efficient Performance Prediction for Large-Scale, Data-Intensive Applications
This paper presents a simulation-based performance prediction framework for large-scale, data-intensive applications on large-scale machines. The framework consists of two components: application emulators and a suite of simulators. Application emulators provide a parameterized model of data access and computation patterns of the applications and enable changing critical application components...
Heuristic approach to solve hybrid flow shop scheduling problem with unrelated parallel machines
In the hybrid flow shop (HFS) scheduling problem with unrelated parallel machines, a set of n jobs is processed on k machines. A mixed integer linear programming (MILP) model for the HFS scheduling problem with unrelated parallel machines has been proposed to minimize the maximum completion time (makespan). Since the problem is shown to be NP-complete, it is necessary to use heuristic methods to ...
Performance Prediction for Data Intensive Applications on Large Scale Parallel Systems
This paper presents a new interactive performance estimation tool – PetaSIM for large scale parallel systems. Our main approach is to divide the difficult performance estimation problem into three domains: application, software and hardware, to extract the system specifications and provide tools for the interactive changes of the system parameters over the Internet. Computers, networks and appl...
Performance Prediction and Ranking of Supercomputers
Performance prediction asks how much time executing an application is likely to take on a particular machine. Machine ranking asks which of a set of machines is likely to execute an application most quickly. These two questions are discussed within the context of large parallel applications run on supercomputers. Different techniques are surveyed, including a framework for a general approach th...
Achieving High Performance on Extremely Large Parallel Machines: Performance Prediction and Load Balancing
Parallel machines with an extremely large number of processors (at least tens of thousands of processors) are now in operation. For example, the IBM BlueGene/L machine with 128K processors is currently being deployed. It is going to be a significant challenge for application developers to write parallel programs in order to exploit the enormous compute power available and manually scale their appl...